
Ep. #1, Introducing Data Renegades
In this debut episode of Data Renegades, CL Kao and Dori Wilson dive into the unconventional paths that led them into developer tools, civic tech, and machine learning systems. CL recounts his experience building early open source infrastructure, mobilizing communities through data transparency projects, and now shaping how we design software around LLMs. This episode sets the tone for the show’s mission: celebrating builders who redefine the boundaries of what tools can do.
- Recce
- Apache HTTP Server
- Git (context for “pre-Git” decentralized version control systems)
- g0v (Taiwan civic tech community)
- OpenSecrets
- PostgreSQL
- PostgREST
- Supabase
- dbt (Data Build Tool)
- SQLMesh
- Metabase
- Looker
- GitHub Actions
Transcript
CL Kao: Welcome to Data Renegades, where we interview renegades who, at some point, started a new framework or a new abstraction that profoundly impacted our industry. We want to bring that experience and thought leadership to the audience, to inspire people to build more things, test more things, and advance the whole industry.
Dori Wilson: But today is going to be a little more self-serving than that, as it's the first podcast that CL and I are recording. So we're going to be talking to each other, maybe a little bit more self-serving, with us being the renegades ourselves, but you know, starting somewhere.
CL, why don't you start off by giving a little more bio about yourself so our listeners know why they should care and tune in.
CL: Yeah, of course. I grew up as a competitive programmer and got into open source back in the day, building web applications and contributing to Apache, the web server, which was the most popular one at the time. And then for the past 20 years I've been doing mostly developer infrastructure tooling.
I worked on one of the earliest decentralized version control systems, one that predated Git and was used by Apple and a couple of big companies at the time. It was open source; we didn't really commercialize it, but we were playing at the frontier of what developer experience should look like.
And if you're old enough to have used that old version control system, you'll remember it was very hard for a non-contributor to work with an open source project. That's when I felt the pain myself and decided we needed a better tool that allows more people to contribute to software in a more ergonomic way.
So from there I got involved in a pretty global open source community. I moved to the UK to work for a couple of years and then, oh where was I?
Dori: You started off with the move to Europe which I always thought was such an interesting part of your story. Especially now, you know we both live here in San Francisco in the Bay and it always seems when you introduce somebody to me, you're like, "Oh yeah, I met them in London. Oh yeah, we jammed on this over in Germany." It's really cool.
CL: Yeah. So actually the move to the US last year was with a lot of help from old friends from that time, when we met around the world just hacking on open source stuff.
But fast forward to why I'm doing data-related tooling today. In 2012 I started a civic tech community back in Taiwan, trying to see if we could adopt the things we learned from the open source community to organize people, not just those with a technical background but interdisciplinary people who care about democracy and how modern society should function, with the help of data and technology.
And so from there, the first project was actually budget visualization, because understanding the budget is a way to participate in democracy. At the time, of course, there were published PDFs, but they're really hard to comprehend if you don't know what the terminology means.
Dori: As someone who's worked on government PDFs when I was at the SF Fed, I know exactly what you mean.
CL: Yeah. And then how do you engage the general public with that. Right? So you got to turn the data into something so people can feel that they're connected to the data. What does it mean? Does it connect to a local improvement project that's in my neighborhood?
So that was the initial project in the civic tech community, and that's where I started to work on a lot of budget data and legislation data. At that time, of course, there were no LLMs yet, so processing all of that data was pretty tedious. We worked with pretty traditional ad hoc data pipelines, and I was already feeling...
Dori: Wait. At the risk of aging you, ground us. What year was this?
CL: It was 2012. Okay, so working with all this data, usually you have some ad hoc script trying to transform the data in one way or another. So I was exploring whether there was a better way to do it at the time.
So I created a Postgres extension that essentially does data modeling within Postgres and turns relational data into a REST server inside Postgres. At the time it was called PgREST. And I think there's another take on that idea nowadays, PostgREST, which is supported by Supabase.
Dori: That is all awesome. It's very technical. So why don't you explain that, a little more in layman terms about what were the benefits of doing that method over others at that time in 2012.
CL: Okay. So in the old days of working with data in web applications, we often told each other that SaaS is basically a database wrapper. Right? You have all this CRUD, as in create, read, update, and delete functions, wrapping around a database.
So traditionally in web development you have something called an ORM, object-relational mapping, that turns the records in a database into a representation in the software stack that you can manipulate.
And often that works in a client server way that the browser talks to a web server that operates on certain records. And then so you can see how you as an end user use a web browser to talk to a certain backend that talks to a database. This is a very traditional kind of web app.
So by combining that ORM layer, which is the server-side software's understanding of the data, with the database itself, you reduce that very thin layer of wrapping and unnecessary code.
So it was exploring whether that's a good practice: you have data already modeled in the database, and then you define some mapping that turns it into a web server, some REST response the front end can use to operate on the database directly.
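A minimal sketch of the idea CL describes, collapsing the ORM layer so the database directly answers REST-style requests. This uses Python's sqlite3 as a stand-in for Postgres; the `rest_get` helper is illustrative, not PgREST's or PostgREST's actual interface:

```python
import json
import sqlite3

# In-memory database standing in for Postgres.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
    CREATE TABLE budgets (id INTEGER PRIMARY KEY, agency TEXT, amount REAL);
    INSERT INTO budgets (agency, amount) VALUES
        ('Transportation', 1200000.0),
        ('Education', 3400000.0);
""")

def rest_get(table, row_id=None):
    """Serve GET /<table>[/<id>] straight from the table: the thin CRUD
    wrapper that PgREST-style tools fold into the database layer."""
    if row_id is None:
        rows = conn.execute(f"SELECT * FROM {table}").fetchall()
        return json.dumps([dict(r) for r in rows])
    row = conn.execute(f"SELECT * FROM {table} WHERE id = ?", (row_id,)).fetchone()
    return json.dumps(dict(row) if row else None)

print(rest_get("budgets"))     # the whole collection as JSON
print(rest_get("budgets", 2))  # a single resource
```

The point of the design: the front end consumes these JSON responses directly, with no separate application server mediating the mapping.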
Dori: So in this case it would be you're taking the PDFs, you're transitioning them into the database and then making it serviceable where people can then just hit a REST API for a different web app in order to do that?
CL: Yeah.
So that's when I started to feel maybe there's a new, better way to do data. It shouldn't be so many ad hoc scripts maintaining the data or ingesting new data.
And of course, fast forward, we have the so-called modern data stack. You have scalable warehouses that separate compute and storage, and you have all this tooling for ingestion, transformation, and then modern BI tools.
But when we think about it, the development cycle is still pretty brittle. As in, you model something, and now it's great that you have a code-first way such as dbt or SQLMesh. Code-first meaning it's declarative: you can finally put it in CI/CD and test it in a way that builds into a separate database and then compares results.
So that's already a great advance in the ergonomics of data development. What's missing is really: how do you know the thing your new logic produced is actually correct? Oftentimes, as I learned from working in civic tech--
Data is just a bunch of facts. How those facts are meaningful, how they're interpreted, really depends on someone with domain knowledge who knows what they mean and what impact they have.
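The "build into a separate database and compare results" workflow CL describes can be sketched in miniature. This is a toy Python stand-in, with sqlite3 in place of a warehouse; it is not dbt's or SQLMesh's actual mechanism:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE donations (donor TEXT, amount REAL);
    INSERT INTO donations VALUES ('A Corp', 100.0), ('A Corp', 250.0), ('B Org', 80.0);
""")

# Two versions of the same model: current logic and a proposed change.
MODEL_OLD = "SELECT donor, SUM(amount) AS total FROM donations GROUP BY donor"
MODEL_NEW = "SELECT donor, SUM(amount) AS total FROM donations WHERE amount >= 100 GROUP BY donor"

def build(sql):
    """Materialize a model into a dict, like building into a dev schema."""
    return {donor: total for donor, total in conn.execute(sql)}

def diff(old, new):
    """Report rows added, removed, or changed between the two builds."""
    keys = set(old) | set(new)
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}

changes = diff(build(MODEL_OLD), build(MODEL_NEW))
print(changes)  # 'B Org' is dropped entirely by the new filter
```

The comparison surfaces *what* changed; deciding whether dropping 'B Org' is correct still takes the domain knowledge CL mentions.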
Dori: Yeah. And in the case of g0v it's not just, "what impact does it have?" It's then helping people translate to, "why does this matter to me?" It's making it personal so they really get engaged.
CL: Exactly.
Dori: Yeah.
CL: The g0v era is really about thinking how we empower more people: someone with a technical background who hasn't worked with much data, or someone from a particular NGO who knows a certain issue and wants to equip themselves with data to be more convincing to their audience. That community has grown to 13,000 people on Slack, working on hundreds of projects.
Dori: How many corrupt politicians have they caught at this point?
CL: Yeah, there was a pretty prominent project doing campaign finance digitization. Campaign finance is a very important part of how money impacts politics. Right? It used to be on paper; actually they had digital records, but they didn't publish them, and you had to go into a particular building to print them out.
So there was a g0v project that mobilized 30 people to go in there every day, print the records out at the print store next door, and put them on Google Drive. Right? We didn't do OCR at the time. Instead our friend Ronnie turned it into a CAPTCHA-style game: since the data is tabular, you can do some really basic image processing, turn each cell into a CAPTCHA-like image, and call on people: "hey, just help save democracy by digitizing this thing."
So it's not that we couldn't do OCR, but this created immense engagement from the public.
Dori: It's community building at the same time as loading the data.
CL: Yeah. So there are like 10,000 people trying to fix democracy by typing numbers. And then within 24 hours we'd clear like 300,000 records for the first batch.
Dori: How did you get all those initial people engaged? I mean this is something that we're thinking a lot about internally at Recce too is how do we build community?
CL: Yeah.
Dori: And this is an incredible community. I mean, very different message: "Save democracy" versus "make sure your data is validated and is accurate in context."
CL: Yeah, that's a great question.
When we started it was a tiny hackathon. We were thinking maybe 30 people would care about this and then just suddenly there's over 100 people every time. People wanted to bring their struggle, bring their passion issue to the community because they know they will find someone there and then collaboratively solve something.
I got a lot of inspiration from the open source community that people go there and then build things. It's not just thinking about potential solutions, it's like start prototyping things that made that a very unique community. Lots of great people.
Even if they have very different points of view, they can collaborate on the facts that sit in between those points of view. From that learning, there's a manifesto we worked on pretty early: basically, don't ask why something hasn't been done, because you're the reason it hasn't been done.
Dori: Because you're the one to do it. Go out and build.
CL: Yeah, yeah, yeah. And then it's embracing the open source way: we ask people to come share what they care about, and you don't have to commit your full life to changing politics; this is the way you contribute. And by the way open source works, this accumulates: we produce artifacts over time and improve upon each other's artifacts.
Dori: Yeah. And in a nonpartisan way too, because this is just making sure funds are appropriated properly and that people understand what they're used for. So you can really bring people in and find that common ground.
CL: Yeah, yeah. And there are a lot of passion projects, like illegal factories on farmland: can we have a crowdsourced way of reporting them? Then NGOs step in as the reporting entity that actually reports these illegal buildings to the government and tracks them. Right? So it's a very interesting multi-stakeholder way: people want to solve problems, and here they assemble. Yeah.
Dori: Before we get into kind of the next chapter, if people want to know more about g0v, where can they go to find out?
CL: There's a website called g0v.tw, that's g, zero, v, dot tw, and I believe there's English content on there as well.
Dori: Cool. All right, now into the next chapter. Okay, so we've talked about how you first really got into data: your background in competitive programming, building out version control pre-Git, which, as somebody who has only ever used Git for version control, I literally cannot imagine.
When we first talked about it, I just hadn't even considered, "oh, of course there'd be other version control systems." So how'd you get into Recce? What was that next step like after g0v?
CL: Prior to Recce there was another startup I founded doing machine learning operations. That was during a previous GPU boom, when people were starting to buy GPUs and trying to apply deep learning to various traditional ML problems.
The product we built was a data science workbench for mid-sized teams: they could share the environment, make sure they had the same versions of libraries, allocate GPUs, and all that. It was used by some of the largest companies in the world and some banks in Taiwan.
That was also on a mission. I was thinking about software engineering practice at the time as applied to data science and machine learning, and there was still a gap there. You probably know the pattern: data scientists throw code over to MLEs to re-implement the whole thing, and then once it's in production everything is monitored differently.
Dori: MLE being machine learning engineers.
CL: Yeah, so that was an attempt to solve that problem. But we found out we had started by solving too big a problem, which spread us very thin, because the entire machine learning lifecycle covers everything: data, ingestion, cleaning...
Dori: There's problems everywhere, at every step of that issue.
CL: Exactly.
Dori: So a classic founder trap of, "we have a big vision, we have multiple problems and we want to solve all of them."
CL: Yeah. So I think that's when I found it, because it started very naive: okay, what's the development experience for this new type of software, which we probably called software 2.0 at the time? Now, in the agentic world, it's software 3.0. What's the new collaboration abstraction for software 2.0?
And then we talked about Git, and then GitHub. Right? Git started in 2005 and GitHub in 2008, and it's a very interesting progression: a new abstraction was created that's now core to our development cycle.
For example, pull requests. Before pull requests, you had a very tedious way to do branching and merging, and pull requests streamlined that. But more importantly, it's an abstraction that enabled the DevOps practice to become not just a best practice but ready-made products that can be integrated into your workflow, like CircleCI or all the different CI/CD tools.
So that important abstraction is, I think, crucial to the current developer workflow for traditional software. And we're missing that abstraction for the new type of software we're building in 3.0.
Dori: Okay, so we have 2.0. Version control is really out there. DevOps has now grown. For the software 3.0, what are we missing? What is the ops? Is it ops?
CL: Yeah. All right, let's first maybe define this a little bit. Like when we say software 1.0 it's like you basically code up all the logic, right? And then software 2.0 by definition is you have a machine learning model that learns from the data. So you have your software's logic partially dictated by a machine learning model that maybe makes recommendations, certain automation.
So that encompasses a large amount of the logic this program will actually operate with. And the challenge for software 2.0 is managing the data, managing the training workflow, and managing the model's monitoring and all that.
And when we talk about software 3.0, things are partially built by LLMs, and they also invoke LLMs as part of the core logic. Traditionally you have a backend API. What if you don't write the backend API's actual code? You just have natural language saying, well, this API should load this person and then find their past 10 purchases, and so on.
Dori: And really creating something non-deterministic. I think that's the difference for me between 2.0 and 3.0. Not all of the 2.0 models are necessarily deterministic, but with LLMs, you're not going to get the same output even if you put the same input in over and over again. When I've thought about data models, that's one of the scariest things to think about.
CL: That is exactly right. And combined with agents, sure, there's a way to do self-correction, and you impose some guardrails. I think it's now safe to say that with a proper agentic loop, a sufficiently advanced model can achieve the goal eventually. But it depends on...
Dori: What is the eventually? How long is eventually?
CL: Yeah. And the collateral damage in between there.
Dori: What do we break along the way?
CL: Yeah. So the challenge now becomes okay, how do you define correctness and then have it in a way that you can trust the machine to evaluate, but also have the human in the loop to evaluate that result being correct?
Dori: How do you see the interplay between. A lot of the conversations I've seen at the various conferences we've been attending have been LLMs to evaluate LLMs. Where would you see the human fitting in that? What's the relationship?
CL: So LLM-as-a-judge is pretty common, because it's not perfect but it gives you a good baseline without a lot of experts involved. Right? A couple of companies are doing fine-tuned LLMs as judges: the human gives judgments a few times, and a fine-tuning loop learns how the judge can be more effective.
But I think this is still emerging, and there are probably going to be new practices. One of the things I'm seeing is that in the data space, people often talk about needing to bring software engineering practice into the data workflow. Right?
And then what does that mean? A data workflow is more experimental. You do a lot of exploration before you actually commit code or say that this is final code.
Dori: Yeah. Because you're building an understanding. And how can you code it if you don't understand what you're coding?
CL: Yeah, exactly. So that's been a constant debate for the past couple of years: whether data engineering is a subset of software engineering or a completely different practice. But what I find most intriguing is that the new type of software people are building is actually more experimental. You're not coding up to a certain PRD anymore.
Dori: Mhm.
CL: You're experimenting with how the software should behave. And a lot of times you cannot, like in traditional software, create exhaustive tests to make sure it behaves a certain way, but it's good enough to start with.
And then that creates a very interesting question. What does the development cycle look like? But more fundamentally, the shelf life for software may be very different.
Dori: Oh yeah. Well, I mean, there's a lot of discussion, right? You and I were talking about this earlier today: a new model gets released, and what happens? You have GPT-4; now GPT-5 is out. Now you have to learn a whole new set of vibes. How is this going to impact your workflows? How are you going to evaluate it?
CL: Yeah. So for example, if you have a simple prompt or an agentic workflow that you are now switching the backend model, now of course it will behave slightly or massively different. Right.
Dori: Essentially your agent, your intern has redone all of their knowledge.
CL: Exactly. And then now maybe you need to tune some of those prompts. But before doing that, how do you know which part of the task they're assigned to is now better or worse?
Traditionally, software relies a lot on code review, where your peers nitpick your code, maybe not just for style but for whether it covers edge cases or is a good architecture for the desired workload. But for this new type of software, that code review doesn't exist anymore, because you don't critique other people's prompts. Maybe a prompt works better on the next model; you don't know. So our thesis is really that in the future, code review is going to look very much like data review.
You care more about the input and output pairs, and about how the underlying change, whether to the prompt, the model, the tool use, or whatever else you put the LLM up to, affects that input-to-output mapping. That actually maps to a data problem: you have a bunch of inputs, a bunch of transformations, and, if you instrument every tool call in between, a record of how the end result came about through all that internal logic.
So it creates, it's almost like a data pipeline. Right?
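A miniature version of the "data review" CL describes: run the same inputs through two versions of the logic, then review the (input, before, after) rows rather than the code diff itself. The two `summarize` functions are toy stand-ins for a prompt or model change:

```python
# Two versions of some LLM-backed step, as trivial deterministic stand-ins.
def summarize_v1(text):
    return text.split(".")[0].strip()   # "before": first sentence

def summarize_v2(text):
    return text.strip().split()[0]      # "after": first word only

inputs = [
    "Budgets fund local projects. Details follow.",
    "Donations are capped by law. Shell companies evade this.",
]

# The review artifact: input/output pairs across both versions.
review_table = [
    {"input": i, "before": summarize_v1(i), "after": summarize_v2(i)}
    for i in inputs
]

for row in review_table:
    flag = "CHANGED" if row["before"] != row["after"] else "same"
    print(flag, "|", row["input"][:30], "|", row["after"])
```

A reviewer scans the flagged rows to judge whether the behavior shift is acceptable, which is exactly a data-diff workflow, not a code-diff one.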
Dori: I mean, I would say it is. Right? I mean it's just a different type of data. A lot of times people think about data as numbers, but it's really about information and variables.
It's interesting talking with you and working at Recce, which very much comes from that software engineering side moving into data engineering, and translating that.
Let's talk a little more about going from the data space into data engineering, where it starts being a bit more like software engineering. Right? Which, granted, is more my background, coming from data scientist to self-taught data engineer. But I think it's a very different mindset, so I'd just love to hear, in your words, how you see that working.
CL: So I think in communities like g0v, or when I met you, people from a data background definitely have a very different mindset. And the good ones really care about the results: "What are we using this data for?" Not data for data's sake. Right?
And that's kind of a blind spot from the software engineering perspective. When we worked on budget data or legislation data, we just wanted to make everything friendly and available, so people with the domain knowledge could do analytics properly.
We didn't necessarily come in with a problem in mind; we just had this urge to make the data very clean and accessible. That's why the civic tech community is awesome: it combines the two types of people, who then work out what value the data can actually create.
Dori: Yeah, to put it another way: okay, the structures matter, the code matters. But on the other hand, when you're coming from a data background, it's really all about the data and what you want to do with it.
It's not so much that if the code worked this way or that way, I don't particularly care as long as it's efficient and within my compute pipelines in giving me what I want. Right? That's what I need. What I want being either helping my stakeholder answer a question, helping us decide product direction.
Maybe the data is the product. Some companies treat data as the product: making sure it's clean, accurate, and what their customers expect.
CL: Absolutely. That's what I find with people from a data background: curiosity is key. What do you want to do with this data? What can it potentially answer?
Dori: Mhm.
CL: And then can we model the data or shape the data in a way to answer those questions?
Dori: Catch the corrupt politicians. Did they build the factory on the farmland?
CL: Something like that. Or they use shadow companies to combine donations above what's legally allowed and so on.
Dori: Yeah, yeah. In the US we don't need shadow corporations to do that.
We talked a lot about, okay, what does 3.0 look like? What does it mean? How would you categorize that? Like, what would you call this age? We talked about, like, for example, 2.0 was DevOps. Like, what is this age? Like, what is the category that needs to be created or is being leaned more into?
CL: I feel this is a super weird time. And it's about to get even more weird. Right?
Traditionally, when we learn computer science, data structures, and algorithms, there's a saying: "program equals data structure plus algorithm," right? So you have a way to represent your data, and you have logic that processes that represented data. And then, boom, you put all the data in, and you have a program, right?
And then software 2.0 is a little bit weird, because you don't actually write that code. There's definitely still code doing the training, right, but at execution time you're executing the model.
So that blurs the boundary between code and data. And then 3.0 is even more weird, right, because natural language does a lot of the magic.
Dori: Literally a black box, so no one knows the model behind it.
CL: And here's the thing: when we build a programming language, we treat the code as data, and we turn that data, the source code, into something executable. Now imagine you no longer need a specific DSL to represent a program; natural language is the program.
And now the relationship between that program and what we expect it to do is defined more by data. Right? These are the 1,000 golden data entries that I care about; the behavior better be at least this.
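The "golden entries" gate might look like this in miniature. Here `classify` is a toy stand-in for an LLM-backed step, and the 0.75 threshold is an assumed example, not a recommended value:

```python
# A small set of inputs with expected behavior: the "golden" entries.
GOLDEN = [
    ("refund my order", "support"),
    ("what is your pricing", "sales"),
    ("the app crashes on login", "support"),
    ("do you offer discounts", "sales"),
]

def classify(text):
    # Stand-in logic; in practice this would invoke a model.
    return "sales" if any(w in text for w in ("pricing", "discount")) else "support"

def golden_score(fn, golden):
    """Fraction of golden entries where behavior matches expectation."""
    return sum(fn(inp) == expected for inp, expected in golden) / len(golden)

score = golden_score(classify, GOLDEN)
print(f"golden accuracy: {score:.2f}")

# The gate: a new version of the logic must clear this before shipping.
assert score >= 0.75, "new version regressed below the golden-set gate"
```

Swap in a new prompt or model behind `classify`, re-run the gate, and the data, not the code diff, tells you whether behavior held.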
Dori: So there's not yet a category. It sounds like it's still being defined. It's more of the software engineers are having to move more into the data mindset of the output and understanding the context of what they're building.
And the data engineers are having to move a little bit more into the software engineering mindset, to think a little bit more about structure and how to have a little bit more rigor in the things that they're building and when they build it.
CL: Yeah, at the end of the day.
Dori: But not yet a word. Not yet a word.
CL: No, I think it's definitely converging, and the reason is this: for pure data work, you mostly need to answer questions. But when we talk about data engineering, it's really about how you effectively and repeatedly answer similar questions.
So you have to build out that infrastructure or representation of...
Dori: Productionize, answering the questions.
CL: Yeah. So there's an engineering practice that needs to be put in. It's no longer ad hoc, though maybe it enables ad hoc work. But you still have a core foundation of what knowledge is represented within the organization.
Dori: So who at a company, given what we're talking about with software 3.0 and this intersection, either should be building this way, should start thinking this way, or is already thinking this way? Is it a particular job title or role, or is it just a skill set?
CL: I feel there's a lot of discussion about how teams are able to build a lot faster when the traditional EPD, meaning "engineering, product, and design," is combined: a single person can do a lot. Of course they're probably not experts in every field, but if they understand a good chunk of what good looks like in the other disciplines, they can be super productive.
Dori: You can get to the "good enough."
CL: Yeah. So I think this is a super interesting time. Coming from my background in competitive programming, we studied algorithms in high school, and it already blew my mind when people started talking about how social media controls us through algorithms. Non-computer-science people talking about algorithms! That was weird enough, about 10 years ago.
Dori: Oh yeah, yeah, we gotta feed daddy algo. Yep, yep.
CL: It's something along the lines of vibe coding: the general public can now do this and talk about this. It's super interesting.
Dori: Oh yeah, I was talking with one of my cousins, she's a VP of data science at a mid-sized company. I think they do biotech. She was saying how she built out an entire workflow herself. The workflow being they used to have to be on top of their vendors. If their vendors fell behind a certain quota they'd have to have a meeting and send them a warning. You know all of this stuff.
She built it out herself, and she doesn't have any coding experience. Right? She has an MBA. This is the type of background we're talking about that has now automated that entire process: figuring out the vendor is underperforming, emailing the vendor, sending the warning, and, if they don't respond appropriately, doing a follow-up or shifting their supply chain.
Right? That's something that was not possible at all two or three years ago.
CL: It is definitely enabling a lot. My wife is an author and she wants to track how her book is on the charts. And then so one day she was vibe coding a data collector and then put the cron job into a GitHub Action. I asked, "what are you doing?"
And she's like, "I have no idea. The LLM told me to open this file and then just put on all this weird things and it seems to work."
Dori: Oh yeah, yeah, yeah. Which is at the most basic level getting at that, "so what" that we've been talking about.
Okay. So we've talked about how things are going today, democratizing that workflow, to use an overused word, but it does fit here, especially with g0v, just to pull that back in. Where do you think we're going to be in five years?
CL: So, many people mention that this is the most normal time, and we should remember it, because things are going to get even more strange.
Right. And when you think about the scale of that enabling: people can build things that previously would have taken a team. A couple of people have mentioned the idea of personal software: I'll just build the thing myself. I don't even need to buy a SaaS or compare different options.
And that could work for myself, my friends, very niche things. Maybe you want a pickleball booking system that you just build.
Dori: Hyper personalized software.
CL: Exactly. But what I'm still thinking about is: even though that might not require production-grade rigor compared to something serving a million users, what is the basic testing we can help the next generation of builders use and leverage, so they can be confident about the things they build and ship, even if they're just shipping to themselves?
Dori: Yeah. So my cousin in five years time would be able to do her own evals on these pipelines she doesn't understand. I mean it's interesting because I think on one hand there's this school of thought about what you're saying, like this is the most normal time we're going to live in and it's just going to get crazier and we'll be more hyper personalized.
But on the other hand, and I'm curious about your thoughts here, we've also seen this massive consolidation of companies. How do you see that interplay working out, between "oh yeah, we can have everything personalized" versus no one really wants to do the work, they want something ready-made? And the industry itself is moving more and more toward consolidation.
CL: It's a very broad question; there are a couple of threads I can pull on. For personal software, it's really: what's the effort for me to get something custom-built that I can be confident about, compared to all my effort figuring out what people are saying about my options? Right?
Like a finance tracker or anything. There are so many options, and what if it's easier to build one and have it correct itself?
Dori: A finance tracker is such a good example. That's something I've really been wanting a better version of recently.
CL: Yeah.
So I think the tipping point will be when, for a majority of use cases, people start to find it easier and cheaper. It becomes like ordering breakfast: you order a piece of software that works most of the time for you.
What might be interesting is what abstractions that type of software brings. What I mean by that is: does the software separate logic and data, so that even if the user suspects something is wrong with the logic, they can be confident the data is correct and the logic can be fixed?
So what structure in this hyper-personalized software can instill that confidence in the user? I think that will be a key question. And when you mention consolidation, that's definitely more about traditional SaaS or enterprise software.
It's always a pendulum. People want to buy a one-stop-shop solution, and then in periods when best practices shift, or there's new demand or new ways of doing things, niche players emerge from individual sectors. So I think it's a continuous process: new things come out, then get consolidated, and that's where we see so many interesting innovations.
Dori: Yeah. Okay, CL, so we've talked a lot today. It ended up being me interviewing you a little bit. I hope our listeners have enjoyed the journey. For our final couple of minutes, why don't you talk a little bit about why we're doing this?
CL: For the podcast, right? The industry improves in an interesting, nonlinear way, and each step function usually happens when an important abstraction or toolkit gets created that reshapes how people think about the problem.
Dori: Yeah.
CL: Because we all know now that LLMs have limited context, and so do humans. When we have an abstraction, we hold fewer concepts in our heads, which means we can create more complex things on top of that abstraction.
Dori: Get more creative, have more energy and space.
CL: You have bigger building blocks to build with. And if they're well designed and flexible, you end up able to build more with less.
So our goal is really to interview builders and practitioners who think this way, people creating critical parts of how we handle our data and software layers today.
And to talk through their journeys. Usually you start something to scratch your own itch, and then, sometimes quite unexpectedly, you change the whole industry.
I just enjoy those conversations, and helping people learn more about these kinds of hidden transformations in the industry.
Dori: Yeah. Well, I also think that for a lot of people who aren't in the Bay Area, these are conversations we only get to have by being here, as part of the energy of living in this tech bubble.
So being able to share that energy, and the thoughts and discussions we get to have in our bubble, both in the early-founder community and the data space in San Francisco, is really exciting. I expect people to learn a lot, and I personally expect to learn a lot, too.
CL: Yes, definitely.
Dori: Okay. We'll see if this ends up being the official outro, but at least for this first one, some initial questions. It's just going to be rapid fire.
CL: Yes.
Dori: First programming language you loved.
CL: The first programming language I loved is probably Perl.
Dori: Ooh. Yep. Early, early. Yep, yep, yep. What's the nerdiest thing you've automated in your life? What's the agent going to take over?
CL: There were a couple of years when I was into home automation: a temperature sensor, a ceiling fan, a dehumidifier, and so on, all wired up together. That was a few years ago, and it was a good, fun project.
Dori: So hardware automation. Got it. Got it. If you weren't a founder, what would you be doing?
CL: I don't know. I can't think of anything.
Dori: Yep, perfect answer. Perfect answer. Our investors will love that. Tabs or spaces?
CL: Oh, spaces.
Dori: Oh, okay. Don't look at my code. I'm a tab person. All right, and last question. What is your hottest take?
CL: Okay, so this could get a little philosophical. When I worked on version control systems, I realized that the real innovation Git brought was content-addressable storage, which means each commit's hash is derived from the commit message, the actual content of the code, and the previous version of everything.
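The chaining CL describes can be sketched in a few lines of Python. This is a hypothetical `commit_hash` helper for illustration only, not Git's actual format: real Git hashes a structured commit object (tree hash, parent hashes, author, committer, message) with SHA-1 rather than the raw file content directly.

```python
import hashlib

def commit_hash(message, content, parent=None):
    # Derive a commit id from the message, the content snapshot,
    # and the parent commit's hash (empty for the first commit).
    payload = "\n".join([
        f"parent {parent or ''}",
        f"message {message}",
        f"content {content}",
    ])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Each commit's hash chains to its parent's hash, so changing any
# earlier message or content changes every later commit id.
c1 = commit_hash("initial commit", "print('hello')")
c2 = commit_hash("add farewell", "print('hello')\nprint('bye')", parent=c1)
```

Because the parent hash is part of the payload, the whole history is addressed by its content: two repositories with identical histories produce identical ids, and any tampering is immediately visible.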
It's super elegant. And it creates an interesting perspective: what if all the code that could ever be written already exists, and we just don't know it yet?
It's like discovering those hashes: which code tree and message hash to that value? Of course, this sounds super weird. But think about it: a lot of great writers describe their writing process, and for me as a programmer, when I create something, there's often a strange moment where the inspiration just arrives and you act on it.
And you can't help but think that if you concentrate, there's something, call it the quantum soup or whatever you like, that's giving you the idea that makes it work.
So think of it as all the potential code already being out there; we just need to uncover it. That's really weird, of course, because the code hasn't been written yet. But if you step above the time dimension, then our mission is to decide which code we should produce, or uncover.
Dori: Like a universal tree of code.
CL: Yeah.
Dori: I feel like we need an edible for that answer. I'm for it.
Awesome. Okay, well, that wraps up our first episode. I hope you all enjoyed. Much more to come. We have an exciting guest lineup.
CL: Yeah. Until then, keep building.
Dori: Keep building.